-
Notifications
You must be signed in to change notification settings - Fork 69
Add Tensor Layout verifier for DPAS layout #4339
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can we move the verifier code to this function which is defined specific for DotOp intel-xpu-backend-for-triton/third_party/intel/lib/Dialect/TritonIntelGPU/IR/Dialect.cpp Line 555 in 5edbad5
This interface is used dedicate to verify the DotOp which is more efficient to reduce compiling time.:
|
@@ -957,6 +1018,7 @@ void TritonIntelGPUDialect::initialize() { | |||
>(); | |||
|
|||
addInterfaces<TritonIntelGPUInferLayoutInterface>(); | |||
addInterfaces<TritonIntelGPUVerifyTensorLayoutInterface>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an discussion with upstream Triton.
We want the third party dialect can use the TritonGPUVerify Interface as the parent class.
@LiyangLingIntel , Do you know what is the response of the upstream and what is the issue for the discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The verify tensor layout interface is called via
Dialect &dialect = layout.getDialect();
auto verifyLayoutInterface =
dyn_cast<mlir::triton::DialectVerifyTensorLayoutInterface>(&dialect);
if (verifyLayoutInterface) {
return verifyLayoutInterface->verifyTensorLayout(layout, rankedTy, op,
makeErr);
}
note that the dialect comes from the layout attribute and not the operation. Why would we need to call the Triton GPU dialect interface / use it as the parent class, when the layouts (attributes) it operates on are not part of our dialect?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what is the response of the upstream and what is the issue for the discussion.
It seems there is no further design update after that discussion.
Why would we need to call the Triton GPU dialect interface / use it as the parent class, when the layouts (attributes) it operates on are not part of our dialect?
As what I can recall, there were some cases that layouts from Triton GPU dialect would also go into Triton Intel GPU dialect verify/infer layout interface. The reason "use the TritonGPUVerify Interface as the parent class" is to reuse common code to reduce duplication.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is some basic legal check implemented in the Triton GPU dialect interface which is valid for the third_party GPU dialect as well.
But right now those basic check is missed when to check the layout defined in third_party.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know about inferLayoutInterface
, but I am curious about verifyLayoutInterface
as the dialect comes directly from the layout, and as far as I know there are no layouts shared between dialects.
Now if the layout were a DotOperandEncoding
layout with parent from the Intel dialect I could see how that might pose a problem, as DotOperandEncoding
would never hit the Intel dialect verifier. But I don't understand how the opposite could be true.
As an aside, are we sure OpsPerChannel should be part of the layout? It is implicit in the tensor type of the arguments to the dot operation and could likely be derived given the A and B encodings. |
The But it gives better experience for reviewing the IR manually from my feeling. |
@chengjunlu does this duplicate #4469 ? |
I think it is not duplicated. The verifier in this PR is special to verify the combination of the scalar type and DotOp layout of the |
48f361e
to
c38da38
Compare
rebased + addressed comments |
%cst = arith.constant dense<0.000000e+00> : tensor<32x32xf32, #dpas> | ||
|
||
// expected-error @+1 {{Layout has opsPerChannel = 2 but tensor element type is 'f32'. Expected 16 bit type.}} | ||
%28 = tt.dot %a_mat, %b_mat, %cst, inputPrecision = tf32 : tensor<32x16xf32, #dot_operand_a> * tensor<16x32xf32, #dot_operand_b> -> tensor<32x32xf32, #dpas> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
%28 = tt.dot %a_mat, %b_mat, %cst, inputPrecision = tf32 : tensor<32x16xf32, #dot_operand_a> * tensor<16x32xf32, #dot_operand_b> -> tensor<32x32xf32, #dpas> | |
%28 = tt.dot %a_mat, %b_mat, %cst, inputPrecision = tf32 : tensor<32x16xf32, #dot_operand_a> * tensor<16x32xf16, #dot_operand_b> -> tensor<32x32xf32, #dpas> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's just keep it the way it was - I am comfortable knowing it tests both permutations and I think the likelihood that they get changed is very low.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about this. Does the test fail with the suggested changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the original set of suggested changes caused a failure but I did not reproduce locally. It seemed safer and more efficient to use the original test which produces the expected results, and which I verified carefully by disabling validation on A and/or B.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just tried locally, and it works, it likely failed because line 172 is changed but line 176 is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's leave it as is - I don't want to have to re-rest and the CI just finished.
e88ea9a
to
d947c54
Compare
Introduces a verifier to ensure the DPAS layout attached to a Dot operation has a suitable opsPerChannel param for the A and B operand inputs to the Dot op. Previously this verification was implicit in the Triton GEN verification, producing a somewhat cryptic error message (prior to #4276 there was no error message):
with this new verifier, the error message is more user friendly:
close #4270